In computer networks, a proxy server is a server (a computer system or an application program) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource, available from a different server. The proxy server evaluates the request according to its filtering rules. For example, it may filter traffic by IP address or protocol. If the filter validates the request, the proxy provides the resource by connecting to the relevant server and requesting the service on behalf of the client. A proxy server may optionally alter the client's request or the server's response, and sometimes it may serve the request without contacting the specified server. In this case, it 'caches' responses from the remote server and answers subsequent requests for the same content directly.
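As a minimal illustration of the client side of this arrangement, the sketch below routes an HTTP request through a proxy using Python's standard library; the proxy address proxy.example.com:3128 is a placeholder, not a real server.

```python
# Minimal sketch: a client routes its HTTP requests through a proxy.
# The proxy address below is a placeholder for illustration only.
import urllib.request

proxy_handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:3128"})
opener = urllib.request.build_opener(proxy_handler)

# The opener sends the request to the proxy, which fetches the page on the client's behalf.
with opener.open("http://example.org/") as response:
    page = response.read()
```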
A proxy server has a large variety of potential purposes, including:
A proxy server that passes requests and replies unmodified is usually called a gateway or sometimes a tunneling proxy.
A proxy server can be placed on the user's local computer or at various points between the user and the destination servers on the Internet.
A reverse proxy is (usually) an Internet-facing proxy used as a front-end to control and protect access to a server on a private network, commonly also performing tasks such as load balancing, authentication, decryption, or caching.
Proxy servers implement one or more of the following functions:
A caching proxy server accelerates service requests by retrieving content saved from a previous request made by the same client or even other clients. Caching proxies keep local copies of frequently requested resources, allowing large organizations to significantly reduce their upstream bandwidth usage and cost, while significantly increasing performance. Most ISPs and large businesses have a caching proxy. Caching proxies were the first kind of proxy server.
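The caching idea can be sketched roughly as follows. This is an illustrative simplification with a hypothetical function name; a real caching proxy also honours HTTP cache-control and expiry headers.

```python
# Illustrative sketch of a caching proxy's core idea: keep local copies of
# fetched resources keyed by URL so repeated requests avoid upstream traffic.
import urllib.request

cache = {}

def fetch_through_cache(url):
    if url in cache:
        return cache[url]            # cache hit: answered locally, no upstream traffic
    with urllib.request.urlopen(url) as response:
        body = response.read()       # cache miss: fetch from the origin server
    cache[url] = body
    return body
```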
Some poorly implemented caching proxies have had downsides (e.g., an inability to use user authentication). Some problems are described in RFC 3143 (Known HTTP Proxy/Caching Problems).
Another important use of the proxy server is to reduce hardware cost. An organization may have many systems on the same network or under the control of a single server, making an individual Internet connection for each system impractical. In such a case, the individual systems can be connected to one proxy server, and the proxy server connected to the main server.
A proxy that focuses on World Wide Web traffic is called a "web proxy". The most common use of a web proxy is to serve as a web cache. Most proxy programs provide a means to deny access to URLs specified in a blacklist, thus providing content filtering. This is often used in corporate, educational, or library environments, and anywhere else content filtering is desired. Some web proxies reformat web pages for a specific purpose or audience, such as for cell phones and PDAs.
A content-filtering web proxy server provides administrative control over the content that may be relayed through the proxy. It is commonly used in both commercial and non-commercial organizations (especially schools) to ensure that Internet usage conforms to an acceptable use policy. In some cases users can circumvent the proxy, since there are services designed to proxy information from a filtered website through a non-filtered site to allow it through the user's proxy.
Some common methods used for content filtering include: URL or DNS blacklists, URL regex filtering, MIME filtering, or content keyword filtering. Some products have been known to employ content analysis techniques to look for traits commonly used by certain types of content providers.
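As a rough sketch of two of these methods, URL regex filtering and content keyword filtering, consider the following; the patterns and keywords are made-up examples, not a real filter database.

```python
# Sketch of URL regex filtering and content keyword filtering.
# The patterns and keywords below are illustrative examples only.
import re

URL_BLACKLIST = [re.compile(r"casino|gambling"), re.compile(r"^https?://ads\.")]
BLOCKED_KEYWORDS = ["forbidden-term"]

def url_allowed(url):
    return not any(pattern.search(url) for pattern in URL_BLACKLIST)

def content_allowed(body_text):
    return not any(keyword in body_text.lower() for keyword in BLOCKED_KEYWORDS)
```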
A content-filtering proxy will often support user authentication to control web access. It also usually produces logs, either to give detailed information about the URLs accessed by specific users, or to monitor bandwidth usage statistics. It may also communicate with daemon-based and/or ICAP-based antivirus software to provide security against viruses and other malware by scanning incoming content in real time before it enters the network.
An anonymous proxy server (sometimes called a web proxy) generally attempts to anonymize web surfing. There are different varieties of anonymizers. One of the more common variations is the open proxy. Because they are typically difficult to track, open proxies are especially useful to those seeking online anonymity, from political dissidents to computer criminals. Some users are merely interested in anonymity for added security, for instance hiding their identities from potentially malicious websites, or on principle, to facilitate constitutional human rights such as freedom of speech. The server receives requests from the anonymizing proxy server, and thus does not receive information about the end user's address. However, the requests are not anonymous to the anonymizing proxy server, and so a degree of trust is present between the proxy server and the user. Many of these servers are funded through a continued advertising link to the user.
Access control: Some proxy servers implement a logon requirement. In large organizations, authorized users must log on to gain access to the web. The organization can thereby track usage to individuals.
Some anonymizing proxy servers may forward data packets with header lines such as HTTP_VIA, HTTP_X_FORWARDED_FOR, or HTTP_FORWARDED, which may reveal the IP address of the client. Other anonymizing proxy servers, known as elite or high-anonymity proxies, include none of these headers, so the server sees only the REMOTE_ADDR value, which is the IP address of the proxy server, making it appear that the proxy server is the client. A website could still suspect a proxy is being used if the client sends packets which include a cookie from a previous visit that did not use the high-anonymity proxy server. Clearing cookies, and possibly the cache, would solve this problem.
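For illustration, a server-side sketch (using the common CGI/WSGI environment-variable convention) of how these values expose or hide the original client; the function name is hypothetical.

```python
# Sketch: what a web server can learn about the client behind a proxy.
# With a transparent/anonymous proxy, X-Forwarded-For may name the real client;
# with an elite (high-anonymity) proxy, only the proxy's own address is visible.
def apparent_client_address(environ):
    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR")
    if forwarded_for:
        # The first entry is the original client as reported by the proxy chain.
        return forwarded_for.split(",")[0].strip()
    # Otherwise only the peer address (the proxy itself, if one is used) is known.
    return environ.get("REMOTE_ADDR")
```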
Proxies can also be installed in order to eavesdrop upon the data-flow between client machines and the web. All accessed pages, as well as all forms submitted, can be captured and analyzed by the proxy operator. For this reason, passwords to online services (such as webmail and banking) should always be exchanged over a cryptographically secured connection, such as SSL.
An intercepting proxy combines a proxy server with a gateway or router (commonly with NAT capabilities). Connections made by client browsers through the gateway are diverted to the proxy without client-side configuration (or often knowledge). Connections may also be diverted from a SOCKS server or other circuit-level proxies.
Intercepting proxies are also commonly referred to as "transparent" proxies, or "forced" proxies, presumably because the existence of the proxy is transparent to the user, or the user is forced to use the proxy regardless of local settings.
Purpose
Intercepting proxies are commonly used in businesses to prevent avoidance of acceptable use policy, and to ease administrative burden, since no client browser configuration is required. This second advantage is, however, diminished by features such as Active Directory group policy, or DHCP and automatic proxy detection.
Intercepting proxies are also commonly used by ISPs in some countries to save upstream bandwidth and improve customer response times by caching. This is more common in countries where bandwidth is more limited (e.g. island nations) or must be paid for.
Issues
The diversion/interception of a TCP connection creates several issues. Firstly, the original destination IP and port must somehow be communicated to the proxy. This is not always possible (e.g. where the gateway and proxy reside on different hosts). There is a class of cross-site attacks which depend on certain behaviour of intercepting proxies that do not check or have access to information about the original (intercepted) destination. This problem can be resolved by using an integrated packet-level and application-level appliance or software which is then able to communicate this information between the packet handler and the proxy.
Intercepting also creates problems for HTTP authentication, especially connection-oriented authentication such as NTLM, since the client browser believes it is talking to a server rather than a proxy. This can cause problems where an intercepting proxy requires authentication and the user then connects to a site which also requires authentication.
Finally, intercepting connections can cause problems for HTTP caches, since some requests and responses become uncacheable by a shared cache.
Therefore, intercepting connections is generally discouraged. However, due to the simplicity of deploying such systems, they are in widespread use.
Implementation Methods
Interception can be performed using Cisco's WCCP (Web Cache Communication Protocol). This proprietary protocol resides on the router and is configured from the cache, allowing the cache to determine what ports and traffic are sent to it via transparent redirection from the router. This redirection can occur in one of two ways: GRE tunneling (OSI Layer 3) or MAC rewrites (OSI Layer 2).
Once traffic reaches the proxy machine itself, interception is commonly performed with NAT (Network Address Translation). Such setups are invisible to the client browser, but leave the proxy visible to the web server and other devices on the Internet side of the proxy. Recent releases of Linux and some BSDs provide TPROXY (Transparent Proxy), which performs IP-level (OSI Layer 3) transparent interception and spoofing of outbound traffic, hiding the proxy IP address from other network devices.
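As an illustrative, Linux-specific sketch (assuming the connection was redirected to the proxy by a NAT rule), an intercepting proxy can recover the destination the client originally tried to reach from the redirected socket:

```python
# Sketch: recover the original destination of a NAT-redirected TCP connection.
# Linux-only; SO_ORIGINAL_DST (value 80) comes from <linux/netfilter_ipv4.h>.
import socket
import struct

SO_ORIGINAL_DST = 80

def original_destination(client_sock):
    # Returns the (ip, port) the client actually tried to reach before redirection.
    raw = client_sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    port, packed_ip = struct.unpack("!2xH4s8x", raw)
    return socket.inet_ntoa(packed_ip), port
```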
Detecting
It is often possible to detect the use of an intercepting proxy server by comparing the client's external IP address to the address seen by an external web server, or sometimes by examining the HTTP headers received by a server. A number of sites have been created to address this issue, by reporting the user's IP address as seen by the site back to the user in a web page.
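A rough sketch of this detection idea compares the address the host believes it has with the address an external echo service reports; the service URL is a placeholder, and note that an ordinary NAT gateway will also produce a mismatch.

```python
# Sketch: compare the local address with the address seen by an external server.
# The echo-service URL is a placeholder; any site that returns the requester's
# IP address works. A mismatch may indicate NAT or an intercepting proxy.
import socket
import urllib.request

def addresses_differ(echo_service="https://ip.example.com/"):
    local_addr = socket.gethostbyname(socket.gethostname())
    with urllib.request.urlopen(echo_service) as response:
        external_addr = response.read().decode().strip()
    return local_addr != external_addr
```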
The term "transparent proxy" is most often used incorrectly to mean "intercepting proxy" (because the client does not need to configure a proxy and cannot directly detect that its requests are being proxied).
However, RFC 2616 (Hypertext Transfer Protocol—HTTP/1.1) offers different definitions:
A security flaw in the way that transparent proxies operate was published by Robert Auger in 2009, [3] and an advisory by the Computer Emergency Response Team [4] was issued listing dozens of affected transparent and intercepting proxy servers.
The term "forced proxy" is used were the user is being force to use the proxy when they would prefer not to. This can be an "intercepting proxy" or a normal configured proxy where the direct (non-proxied) route is being blocked by a packet filter or similar.
It is also sometimes necessary (i.e. forced) to use a configured proxy due to issues with the interception of TCP connections and HTTP. For instance, interception of HTTP requests can affect the usability of a proxy cache, and can greatly affect certain authentication mechanisms. This is primarily because the client thinks it is talking to a server, and so request headers required by a proxy cannot be distinguished from headers that may be required by an upstream server (especially authorization headers). Also, the HTTP specification prohibits caching of responses where the request contained an authorization header.
A suffix proxy server allows a user to access web content by appending the name of the proxy server to the URL of the requested content (e.g. "en.wikipedia.org.example.com").
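For illustration, the rewriting rule behind a suffix proxy can be sketched as follows; "example.com" stands in for the proxy's own domain, and the function names are hypothetical.

```python
# Sketch of suffix-proxy host rewriting: the proxy's domain is appended to the
# requested host, and the proxy strips it again to find the real target.
PROXY_SUFFIX = ".example.com"

def to_suffix_host(host):
    return host + PROXY_SUFFIX                 # en.wikipedia.org -> en.wikipedia.org.example.com

def real_host(proxied_host):
    if proxied_host.endswith(PROXY_SUFFIX):
        return proxied_host[:-len(PROXY_SUFFIX)]
    return proxied_host
```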
Suffix proxy servers are easier to use than regular proxy servers. The concept appeared in 2003 in the form of IPv6Gate and in 2004 in the form of the Coral Content Distribution Network, but the term "suffix proxy" was only coined in October 2008 by "6a.nl".
Because proxies might be used for abuse, system administrators have developed a number of ways to refuse service to open proxies. Many IRC networks automatically test client systems for known types of open proxy. Likewise, an email server may be configured to automatically test e-mail senders for open proxies.
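An illustrative sketch of such a test follows: the server connects back to a newly connected client on common proxy ports and checks whether it relays an HTTP request. The probed ports and probe URL are assumptions; real tests are more thorough.

```python
# Sketch: probe a connecting client's common proxy ports and see whether it
# relays an HTTP request. Ports and probe URL are illustrative only.
import socket

COMMON_PROXY_PORTS = [3128, 8080]

def looks_like_open_proxy(client_ip, timeout=5):
    probe = b"GET http://www.example.com/ HTTP/1.0\r\nHost: www.example.com\r\n\r\n"
    for port in COMMON_PROXY_PORTS:
        try:
            with socket.create_connection((client_ip, port), timeout=timeout) as sock:
                sock.sendall(probe)
                reply = sock.recv(1024)
                if reply.startswith(b"HTTP/1.") and b" 200 " in reply:
                    return True          # the host relayed the request
        except OSError:
            continue                     # port closed or unreachable
    return False
```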
Groups of IRC and electronic mail operators run DNSBLs publishing lists of the IP addresses of known open proxies, such as AHBL, CBL, NJABL, and SORBS.
The ethics of automatically testing clients for open proxies are controversial. Some experts, such as Vernon Schryver, consider such testing to be equivalent to an attacker portscanning the client host. [5] Others consider the client to have solicited the scan by connecting to a server whose terms of service include testing.
The terms "forward proxy" and "forwarding proxy" are a general description of behaviour (forwarding traffic) and thus ambiguous. It is used to refer to a proxy able to retrieve from a wide range of sources (in most cases anywhere on Internet). Except for Reverse proxy the types described on this article are more specialized sub-types of the general forward proxy concept.
A reverse proxy is a proxy server that is installed in the neighborhood of one or more web servers. All traffic coming from the Internet and with a destination of one of the web servers goes through the proxy server. The use of "reverse" originates in its counterpart "forward proxy" since the reverse proxy sits closer to the web server and serves only a restricted set of websites.
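A minimal reverse-proxy sketch, assuming a single backend web server on a private network; the backend address is a placeholder, and real deployments add load balancing, caching, TLS termination, and error handling.

```python
# Sketch of a reverse proxy: a front-end server that forwards each GET request
# to one backend web server and relays the response. Backend address is assumed.
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

BACKEND = "http://10.0.0.5:8080"   # placeholder internal web server

class ReverseProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with urllib.request.urlopen(BACKEND + self.path) as upstream:
            body = upstream.read()
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), ReverseProxyHandler).serve_forever()
```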
There are several reasons for installing reverse proxy servers:
A tunneling proxy server is a means of defeating blocking policies implemented using proxy servers. Tunneling proxy servers are used by people who have been blocked from viewing a particular web site. Most tunneling proxy servers are themselves proxy servers, of varying degrees of sophistication, which effectively implement "bypass policies".
A tunneling proxy server is a web-based page that takes a blocked site and "tunnels" it, allowing the user to view blocked pages. A famous example is elgooG, which allowed users in China to use Google after it had been blocked there. elgooG differs from most tunneling proxy servers in that it circumvents only one block.
A September 2007 report from Citizen Lab and BBC.co.uk recommended the secure (HTTPS-based) proxies HTTP Tunnel, StupidCensorship, and CGIProxy. Alternatively, users could partner with individuals outside the censored network running Psiphon or a Peacefire tunneling proxy server. A more elaborate approach suggested was to run free tunneling software such as FreeGate, or the paid services Anonymizer and Ghost Surf. Also listed were the free application tunneling software PaperBus, Gpass, and HTTP Tunnel, and the paid application software Relakks and Guardster. Lastly, the anonymous communication networks JAP ANON, Tor, and I2P offer a range of possibilities for secure publication and browsing.[6]
Other options include Garden and GTunnel by Garden Networks.
Students are able to access blocked sites (games, chatrooms, messenger, offensive material, internet pornography, social networking, etc.) through a tunneling proxy server. As fast as the filtering software blocks tunneling proxy servers, others spring up. However, in some cases the filter may still intercept traffic to the tunneling proxy server, thus the person who manages the filter can still see the sites that are being visited.
Another use of a tunneling proxy server is to allow access to country-specific services, so that Internet users from other countries may also make use of them. An example is country-restricted reproduction of media and webcasting.
The use of tunneling proxy servers is usually safe, with the exception that sites run by an untrusted third party can have hidden intentions, such as collecting personal information. As a result, users are typically advised against sending personal data such as credit card numbers or passwords through a tunneling proxy server.
In some network configurations, clients attempting to access the proxy server are given different levels of access privilege based on their computer's location or even the MAC address of the network card. However, if one has access to a system with higher access rights, one could use that system as a proxy server through which other clients access the original proxy server, consequently altering their access privileges.
Many workplaces, schools, and colleges restrict the web sites and online services that are made available in their buildings. This is done either with a specialized proxy, called a content filter (both commercial and free products are available), or by using a cache-extension protocol such as ICAP, which allows plug-in extensions to an open caching architecture.
Requests made to the open internet must first pass through an outbound proxy filter. The web-filtering company provides a database of URL patterns (regular expressions) with associated content attributes. This database is updated weekly by site-wide subscription, much like a virus filter subscription. The administrator instructs the web filter to ban broad classes of content (such as sports, pornography, online shopping, gambling, or social networking). Requests that match a banned URL pattern are rejected immediately.
Assuming the requested URL is acceptable, the content is then fetched by the proxy. At this point a dynamic filter may be applied on the return path. For example, JPEG files could be blocked based on fleshtone matches, or language filters could dynamically detect unwanted language. If the content is rejected then an HTTP fetch error is returned and nothing is cached.
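The two stages described above, URL pattern matching on the request and dynamic scanning on the return path, might be sketched roughly as follows; the categories, patterns, and keywords are illustrative, not a vendor database.

```python
# Sketch of two-stage web filtering: match the request URL against banned
# categories first, then scan the fetched content before handing it to the client.
import re
import urllib.request

BANNED_URL_PATTERNS = {
    "gambling": re.compile(r"casino|poker|betting", re.I),
    "social networking": re.compile(r"facebook\.com|twitter\.com", re.I),
}
BANNED_CONTENT_KEYWORDS = ["blockedword"]

def filtered_fetch(url):
    for category, pattern in BANNED_URL_PATTERNS.items():
        if pattern.search(url):
            raise PermissionError(f"URL blocked (category: {category})")
    with urllib.request.urlopen(url) as response:
        body = response.read()
    text = body.decode("utf-8", errors="replace").lower()
    if any(word in text for word in BANNED_CONTENT_KEYWORDS):
        raise PermissionError("content blocked by dynamic filter")  # nothing is cached
    return body
```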
Most web filtering companies use an internet-wide crawling robot that assesses the likelihood that content is of a certain type (e.g. "this content has a 70% chance of being porn, a 40% chance of being sports, and a 30% chance of being news" could be the outcome for one web page). The resultant database is then corrected by manual labor based on complaints or known flaws in the content-matching algorithms.
Web filtering proxies are not able to peer inside secure-socket (HTTPS) transactions, assuming the chain of trust of SSL/TLS has not been tampered with. As a result, users wanting to bypass web filtering will typically search the internet for an open and anonymous HTTPS transparent proxy. They will then program their browser to proxy all requests through the web filter to this anonymous proxy. Those requests will be encrypted with HTTPS. The web filter cannot distinguish these transactions from, say, legitimate access to a financial website. Thus, content filters are only effective against unsophisticated users.
As mentioned above, the SSL/TLS chain-of-trust does rely on trusted root certificate authorities; in a workplace setting where the client is managed by the organization, trust might be granted to a root certificate whose private key is known to the proxy. Concretely, a root certificate generated by the proxy is installed into the browser CA list by IT staff. In such scenarios, proxy analysis of the contents of a SSL/TLS transaction becomes possible. The proxy is effectively operating a man-in-the-middle attack, allowed by the client's trust of a root certificate the proxy owns.
A special case of web proxies is "CGI proxies". These are web sites that allow a user to access a site through them. They generally use PHP or CGI to implement the proxy functionality. These types of proxies are frequently used to gain access to web sites blocked by corporate or school proxies. Since they also hide the user's own IP address from the web sites they access through the proxy, they are sometimes also used to gain a degree of anonymity, called "Proxy Avoidance".
When using a proxy server (for example, an anonymizing HTTP proxy), all data sent to the service being used (for example, an HTTP server in a website) must pass through the proxy server before being sent to the service, mostly in unencrypted form. There is therefore a real risk that a malicious proxy server may record everything sent, including unencrypted logins and passwords.
By chaining proxies which do not reveal data about the original requester, it is possible to obfuscate activities from the eyes of the user's destination. However, more traces will be left on the intermediate hops, which could be used or offered up to trace the user's activities. If the policies and administrators of these other proxies are unknown, the user may fall victim to a false sense of security just because those details are out of sight and mind.
In what is more of an inconvenience than a risk, proxy users may find themselves being blocked from certain Web sites, as numerous forums and Web sites block IP addresses from proxies known to have spammed or trolled the site. Proxy bouncing can be used to maintain privacy.